Disable MLP Fused Ops if Not SwiGLU, Depracted Fast Quantized Peft Plugin, Update Benchmarks #106

fabianlim · 2024-11-08T05:36:27Z

This PR:

Disables the MLP fused ops if the activation function is not SwiGLU. In this implementation, the rules are generated upfront, before checking which model is in use.
Removes the deprecated fast_quantized_peft plugin to prevent further confusion.
Updates the benchmarks to reflect all the new changes.
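
For illustration only, the first point can be sketched roughly as follows: build every patch rule upfront, then drop the fused MLP rule once the model's activation is known. This is a minimal sketch, not the code in this PR. `PatchRule`, `build_rules`, `activation_name`, and `filter_rules` are hypothetical names, and the sketch assumes that a `silu` activation string (read from `hidden_act` or `activation_function` on the model config) is what identifies a SwiGLU-style MLP.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class PatchRule:
    # Hypothetical stand-in for a model patching rule.
    rule_id: str
    requires_swiglu: bool = False


def build_rules():
    # All rules are generated upfront, before the model is inspected.
    return [
        PatchRule("rms_layernorm"),
        PatchRule("fused_mlp", requires_swiglu=True),
        PatchRule("cross_entropy"),
    ]


def activation_name(config):
    # Assumption: the activation is stored as a string under one of these
    # attribute names, depending on the model family.
    for attr in ("hidden_act", "activation_function"):
        value = getattr(config, attr, None)
        if isinstance(value, str):
            return value.lower()
    return None


def filter_rules(rules, config):
    # Keep the fused MLP rule only when the activation is SiLU,
    # which this sketch treats as the marker of a SwiGLU MLP.
    is_swiglu = activation_name(config) in {"silu", "swish"}
    return [r for r in rules if is_swiglu or not r.requires_swiglu]


# Usage with stand-in config objects (not real model configs).
llama_like = SimpleNamespace(hidden_act="silu")
bigcode_like = SimpleNamespace(activation_function="gelu_pytorch_tanh")

print([r.rule_id for r in filter_rules(build_rules(), llama_like)])
print([r.rule_id for r in filter_rules(build_rules(), bigcode_like)])
```

Filtering after generation keeps rule construction independent of any particular model, at the cost of one extra pass over the rule list.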

Updated Benchmarks

Outliers

In general, we noticed two things:

The 70B model now runs out of memory (OOM) with a per-device batch size of 4. This could be due to recent changes; the benchmarks were already hovering around 70+G peak memory utilization. This is similar to "Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout" #50, and we will track it there.
gpt_bigcode got slower with FSDP. The reason for the slowdown is unclear, and it seems to affect only this case. This is tracked in "Slow down observed for BigCode Santa Coder" #110.

outliers.csv

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

Base automatically changed from fix/lora-drop to main November 8, 2024 11:00

fabianlim added 3 commits November 8, 2024 11:06

disable MLP fused op for non-silu, and removed all qpeft plugin

c8aa144

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fix the filter drops rule

de15a0f

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fix all models

9cf8f65

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim force-pushed the fix/swiglu branch from 714b87a to 9cf8f65 Compare November 8, 2024 11:07

fix

bd7528d

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim mentioned this pull request Nov 10, 2024

feat: add liger kernel with fused cross entropy loss #93

Merged

accurately set trl in bnb qpeft fix and file rename

c9057c3

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim changed the title ~~Disable MLP Fused Ops if Not SwiGLU and removed Depracted Fast Quantized Peft Plugin~~ Disable MLP Fused Ops if Not SwiGLU, Depracted Fast Quantized Peft Plugin, Update Benchmarks Nov 10, 2024

update bench

61078ab

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim requested a review from anhuong November 10, 2024 13:29

fabianlim mentioned this pull request Nov 13, 2024

Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout #50

Open

Merge remote-tracking branch 'origin/main' into fix/swiglu

d55a638

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim merged commit 9239802 into main Nov 14, 2024
7 checks passed

fabianlim deleted the fix/swiglu branch November 14, 2024 01:48

fabianlim mentioned this pull request Jan 3, 2025

Decouple Filter MP Rules function from cuda imports #117

Merged
